Streamed batched indexer with dedupe and max-index limit #932

Merged
multiplex55 merged 2 commits into master from codex/refactor-indexer-for-streaming-api
Mar 15, 2026

Conversation

@multiplex55
Owner

Motivation

  • Avoid building one giant Vec<Action> up front when indexing large trees to reduce peak memory usage and startup latency.
  • Allow the UI and main startup/reload flows to receive and merge indexed actions progressively instead of waiting for a full scan to finish.
  • Protect the process from unbounded memory growth and duplicate entries by adding canonical-path deduplication and a configurable max item cap.

Description

  • Introduced IndexOptions, IndexBatchIter, and a new API, index_paths_batched(paths, options), that yields anyhow::Result<Vec<Action>> batches for streaming consumption. The compatibility helper index_paths exhausts the iterator for callers that need a complete Vec<Action>.
  • Implemented canonical-path deduplication using a HashSet<PathBuf> and enforced a max_items cap while iterating; defaults are provided via IndexOptions::default() and the helper IndexOptions::with_max_items(...).
  • Updated startup indexing in src/main.rs to consume batches from index_paths_batched with IndexOptions::with_max_items(settings.max_indexed_items) and merge batches into actions_vec progressively.
  • Updated action reload handling in src/gui/mod.rs to iterate batches from index_paths_batched, extend acts per batch, and update self.actions and call update_action_cache() and search() as each batch arrives, while still finalizing state at the end of the reload.
  • Added Settings.max_indexed_items: Option<usize> and preserved it in SettingsEditor::to_settings so users can control the indexing cap via settings.
  • Added tests in tests/indexer.rs to verify canonicalized output, deduplication across duplicate roots, batch sizing and honoring max_items.
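
For readers who want a concrete picture of the batching described above, here is a minimal sketch of a batched indexer with canonical-path deduplication and a max_items cap. It is not the code from this PR: it uses only the standard library, so std::io::Result stands in for anyhow::Result, and the Action fields, the batch_size field, the default batch size of 256, and the deferred-error handling are all assumptions made for illustration.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Stand-in for the project's Action type; the fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Action {
    pub label: String,
    pub path: PathBuf,
}

/// Assumed shape of IndexOptions; the real field names may differ.
#[derive(Debug, Clone)]
pub struct IndexOptions {
    pub batch_size: usize,
    pub max_items: Option<usize>,
}

impl Default for IndexOptions {
    fn default() -> Self {
        Self { batch_size: 256, max_items: None }
    }
}

impl IndexOptions {
    pub fn with_max_items(max_items: Option<usize>) -> Self {
        Self { max_items, ..Self::default() }
    }
}

pub struct IndexBatchIter {
    pending: Vec<PathBuf>,       // paths still to visit (depth-first stack)
    seen: HashSet<PathBuf>,      // canonical paths already visited
    emitted: usize,              // actions produced so far, checked against the cap
    deferred: Option<io::Error>, // error held back until the current batch is returned
    options: IndexOptions,
}

pub fn index_paths_batched(paths: Vec<PathBuf>, options: IndexOptions) -> IndexBatchIter {
    IndexBatchIter { pending: paths, seen: HashSet::new(), emitted: 0, deferred: None, options }
}

impl IndexBatchIter {
    /// Visits one path: queues directory children, or returns an Action for a file.
    fn visit(&mut self, path: PathBuf) -> io::Result<Option<Action>> {
        let canonical = fs::canonicalize(&path)?;
        if !self.seen.insert(canonical.clone()) {
            return Ok(None); // already visited via another root or a symlink
        }
        if canonical.is_dir() {
            for entry in fs::read_dir(&canonical)? {
                self.pending.push(entry?.path());
            }
            return Ok(None);
        }
        let label = canonical
            .file_name()
            .map(|n| n.to_string_lossy().into_owned())
            .unwrap_or_default();
        Ok(Some(Action { label, path: canonical }))
    }
}

impl Iterator for IndexBatchIter {
    type Item = io::Result<Vec<Action>>;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(err) = self.deferred.take() {
            return Some(Err(err));
        }
        let cap = self.options.max_items.unwrap_or(usize::MAX);
        let batch_size = self.options.batch_size.max(1);
        let mut batch = Vec::new();
        while batch.len() < batch_size && self.emitted < cap {
            let Some(path) = self.pending.pop() else { break };
            match self.visit(path) {
                Ok(Some(action)) => {
                    batch.push(action);
                    self.emitted += 1;
                }
                Ok(None) => {}
                Err(e) if batch.is_empty() => return Some(Err(e)),
                Err(e) => {
                    // Return the partial batch now and surface the error on the next call.
                    self.deferred = Some(e);
                    break;
                }
            }
        }
        if batch.is_empty() { None } else { Some(Ok(batch)) }
    }
}

/// Compatibility helper: drains the iterator into a single Vec.
pub fn index_paths(paths: Vec<PathBuf>) -> io::Result<Vec<Action>> {
    let mut all = Vec::new();
    for batch in index_paths_batched(paths, IndexOptions::default()) {
        all.extend(batch?);
    }
    Ok(all)
}
```

A caller that wants progressive updates would loop over index_paths_batched(...) and extend its own vector with each batch, which is roughly the pattern the src/main.rs and src/gui/mod.rs changes describe. Because the sketch stops pulling from the pending stack once the cap is reached, which items survive a small max_items depends on traversal order.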

Testing

  • Ran cargo fmt, which completed successfully.
  • Added unit tests under tests/indexer.rs covering recursive indexing, canonicalized/display paths, batching, dedupe and max_items behavior (new tests included in the commit).
  • Attempted to run cargo test tests::indexer -- --nocapture and cargo test --test indexer --no-default-features, but the local build was blocked by a missing system dependency required by alsa-sys (missing alsa.pc / pkg-config), so the test run failed to complete in this environment.

Codex Task

@multiplex55 multiplex55 merged commit 3d56184 into master Mar 15, 2026
1 check passed
@multiplex55 multiplex55 deleted the codex/refactor-indexer-for-streaming-api branch March 25, 2026 23:51